Point-Based Bounded Policy Iteration for Decentralized POMDPs
Authors
Abstract
We present a memory-bounded approximate algorithm for solving infinite-horizon decentralized partially observable Markov decision processes (DEC-POMDPs). In particular, we improve upon the bounded policy iteration (BPI) approach, which searches for a locally optimal stochastic finite-state controller, by augmenting it with reachability analysis on controller nodes. As a result, the algorithm applies different optimization criteria to the reachable and the unreachable nodes, which makes its search for an optimal policy more effective. Through experiments on benchmark problems, we show that our algorithm is competitive with the recent nonlinear optimization approach in both solution time and policy quality.
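The idea of separating reachable from unreachable controller nodes can be illustrated with a small sketch. This is not the paper's procedure: it only assumes that a node counts as reachable when it can be reached from the initial node through actions and node transitions of nonzero probability, and that every observation can occur. The names `reachable_nodes`, `action_prob`, `node_trans`, and the toy controller are hypothetical.

```python
from collections import deque


def reachable_nodes(action_prob, node_trans, start, eps=1e-12):
    """Return the set of controller nodes reachable from `start`.

    action_prob[q][a]            -> probability of taking action a in node q
    node_trans[q][(a, o)][q2]    -> probability of moving to node q2 after
                                    taking action a and observing o in node q
    Assumption: every observation o is treated as possible.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        q = queue.popleft()
        for (a, _obs), successors in node_trans[q].items():
            # Skip actions the stochastic controller never selects in q.
            if action_prob[q].get(a, 0.0) <= eps:
                continue
            for q_next, p in successors.items():
                if p > eps and q_next not in seen:
                    seen.add(q_next)
                    queue.append(q_next)
    return seen


# Hypothetical three-node controller for a single agent.
action_prob = {
    0: {"listen": 1.0},
    1: {"open": 1.0},
    2: {"listen": 1.0},
}
node_trans = {
    0: {("listen", "left"): {0: 0.5, 1: 0.5}, ("listen", "right"): {0: 1.0}},
    1: {("open", "left"): {0: 1.0}, ("open", "right"): {0: 1.0}},
    2: {("listen", "left"): {0: 1.0}, ("listen", "right"): {2: 1.0}},
}

reachable = reachable_nodes(action_prob, node_trans, start=0)
unreachable = set(action_prob) - reachable
```

In this toy controller, node 2 has outgoing transitions but no incoming ones from nodes 0 or 1, so it ends up in `unreachable`. A BPI-style improvement step could then treat the two sets with different criteria, as the abstract describes.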
Related Resources
Bounded Policy Iteration for Decentralized POMDPs
We present a bounded policy iteration algorithm for infinite-horizon decentralized POMDPs. Policies are represented as joint stochastic finite-state controllers, which consist of a local controller for each agent. We also let a joint controller include a correlation device that allows the agents to correlate their behavior without exchanging information during execution, and show that this lead...
Generalized and bounded policy iteration for finitely-nested interactive POMDPs: scaling up
Policy iteration algorithms for partially observable Markov decision processes (POMDPs) offer the benefits of quick convergence and the ability to operate directly on the solution, which usually takes the form of a finite state controller. However, the controller tends to grow quickly in size across iterations, which makes its evaluation and improvement costly. Bounded policy iteration pr...
Generalized and Bounded Policy Iteration for Interactive POMDPs
Policy iteration algorithms for solving partially observable Markov decision processes (POMDPs) offer the benefits of quicker convergence and the ability to operate directly on the policy, which usually takes the form of a finite state controller. However, the controller tends to grow quickly in size across iterations, which makes its evaluation and improvement costly. Bounded policy iter...
Rollout Sampling Policy Iteration for Decentralized POMDPs
We present decentralized rollout sampling policy iteration (DecRSPI), a new algorithm for multiagent decision problems formalized as DEC-POMDPs. DecRSPI is designed to improve scalability and tackle problems that lack an explicit model. The algorithm uses Monte-Carlo methods to generate a sample of reachable belief states. Then it computes a joint policy for each belief state based on the rollout...
Heuristic Policy Iteration for Infinite-Horizon Decentralized POMDPs
Decentralized POMDPs (DEC-POMDPs) offer a rich model for planning under uncertainty in multiagent settings. Improving the scalability of solution techniques is an important challenge. While an optimal algorithm has been developed for infinite-horizon DEC-POMDPs, it often requires an intractable amount of time and memory. To address this problem, we present a heuristic version of this algorithm. ...